A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor
Authors
Abstract
We address the efficient realization of matrix multiplication (gemm), with application in the convolution operator for machine learning, for the RISC-V core present in the GreenWaves GAP8 processor. Our approach leverages BLIS (Basic Linear Algebra Instantiation Software) to develop an implementation that (1) re-organizes the gemm algorithm, adapting its micro-kernel to exploit the hardware-supported dot product kernel in the GAP8; (2) explicitly orchestrates the data transfers across the hierarchy of scratchpad memories via DMA (direct memory access); and (3) operates with integer arithmetic.
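As a rough illustration of points (1) and (3), the sketch below shows an integer gemm whose innermost step is a 4-element dot product accumulated in 32 bits. It is not the paper's implementation: `dot4_i8` and `gemm_i8_dot` are hypothetical names, the dot product is written in portable C as a stand-in for a hardware-supported instruction, and the cache/scratchpad blocking, packing, and DMA transfers that a BLIS-like design relies on are omitted. It also assumes that `B` is stored transposed and that `k` is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware 4-way int8 dot product:
   returns acc + a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. */
static inline int32_t dot4_i8(const int8_t *a, const int8_t *b, int32_t acc)
{
    for (int i = 0; i < 4; ++i) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Computes C += A * B with int8 inputs and int32 accumulation.
   A  is m x k, row-major.
   Bt is B transposed (n x k, row-major), so every dot product
      reads contiguous memory from both operands.
   C  is m x n, row-major.
   Assumption: k is a multiple of 4. */
void gemm_i8_dot(size_t m, size_t n, size_t k,
                 const int8_t *A, const int8_t *Bt, int32_t *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = C[i * n + j];
            /* Walk the shared dimension four elements at a time. */
            for (size_t p = 0; p < k; p += 4) {
                acc = dot4_i8(&A[i * k + p], &Bt[j * k + p], acc);
            }
            C[i * n + j] = acc;
        }
    }
}
```

Storing `B` transposed is a design choice made here so that both operands of each dot product are contiguous; a real BLIS-style kernel would achieve a similar effect by packing panels of `A` and `B` into buffers.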
Similar resources
Hardware accelerated approach for floating-point multiplication on 32-bit pipelined RISC-V processor
Implementing hardware support for all extensions of the RISC-V Instruction Set Architecture inside a processor would lead to avoidable area and power consumption for applications that rarely utilize a particular extension. In this paper, authors have first suggested a modified 3-stage pipeline alternative to the ZSCALE processor (32-bit) by UC Berkeley. Subsequently a hardware-accelerated appro...
Vector ISA Extension for Sparse Matrix-Vector Multiplication
In this paper we introduce a vector ISA extension to facilitate sparse matrix manipulation on vector processors (VPs). First we introduce a new Block Based Compressed Storage (BBCS) format for sparse matrix representation and a Block-wise Sparse Matrix-Vector Multiplication approach. Additionally, we propose two vector instructions, Multiple Inner Product and Accumulate (MIPA) and LoaD Section ...
Optimizing Matrix-matrix Multiplication for an Embedded VLIW Processor
The optimization of matrix-matrix multiplication (MMM) performance has been well studied on conventional general-purpose processors like the Intel Pentium 4. Fast algorithms, such as those in the Goto and ATLAS BLAS libraries, exploit common microarchitectural features including superscalar execution and the cache and TLB hierarchy to achieve near-peak performance. However, the microarchitectur...
The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor
A fuzzy RISC processor
In this paper, we describe application-specific extensions for fuzzy processing to a general purpose processor. The application-specific instruction set extensions were defined and evaluated using hardware/software codesign techniques. Based on this approach, we have extended the MIPS instruction set architecture with only a few new instructions to significantly speed up fuzzy computation with ...
Journal
Journal title: The Journal of Supercomputing
Year: 2022
ISSN: 0920-8542, 1573-0484
DOI: https://doi.org/10.1007/s11227-022-04581-6